Clustering Categorical Data

Authors

  • Zhang Yi
  • Ada Wai-Chee Fu
  • Chun Hing Cai
  • Pheng-Ann Heng
Abstract

Dynamical systems approaches for clustering categorical data have been studied by some authors [1]. However, the proposed dynamic algorithm cannot guarantee convergence, so execution may enter an infinite loop even for very simple data. We define a new configuration-updating algorithm for clustering categorical data sets. Consider a relational table with k fields, each of which can assume one of a number of possible values. We represent each possible value in each possible field by an abstract node. Let us denote the nodes by v_i (i = 1, ..., m). A configuration is an assignment of a weight w_i to each node v_i. The new algorithm is defined as follows.
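
The abstract stops before the update rule itself, so that rule is not reproduced here. The setup it describes (one abstract node per value per field, and a configuration as a weight for each node) can still be illustrated. The following is a minimal Python sketch under stated assumptions: the sample table, the names `build_nodes` and `uniform_configuration`, and the choice of a uniform starting weight of 1.0 are all hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# A node is identified by (field index, value). Pairing the value with its
# field keeps identical strings in different fields as distinct nodes.
Node = Tuple[int, str]


def build_nodes(table: List[Tuple[str, ...]]) -> List[Node]:
    """Return one node per distinct (field, value) pair, in first-seen order."""
    seen: Dict[Node, None] = {}
    for row in table:
        for field, value in enumerate(row):
            seen.setdefault((field, value), None)
    return list(seen)


def uniform_configuration(nodes: List[Node]) -> Dict[Node, float]:
    """Assign a weight to every node (assumed uniform start, not from the paper)."""
    return {node: 1.0 for node in nodes}


# Hypothetical relational table with k = 2 fields (colour, size).
table = [
    ("red", "small"),
    ("red", "large"),
    ("blue", "small"),
]

nodes = build_nodes(table)             # the nodes v_1, ..., v_m
config = uniform_configuration(nodes)  # a weight w_i for each node v_i
```

Here m is the number of distinct (field, value) pairs rather than the number of rows, which is why the table above yields four nodes from three tuples. Any configuration-updating step would then read and rewrite the weights in `config`.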

Similar resources

A Clustering Algorithm for Categorical Data Based on Combining Measures

Clustering is one of the main techniques in data mining. Clustering is a process that classifies data set into groups. In clustering, the data in a cluster are the closest to each other and the data in two different clusters have the most difference. Clustering algorithms are divided into two categories according to the type of data: Clustering algorithms for numerical data and clustering algor...

Automatic Clustering of Mixed Data Using a Genetic Algorithm

In the real world clustering problems, it is often encountered to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only efficient for the numeric data rather than the mixed data set. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...

A Simple Yet Fast Clustering Approach for Categorical Data

Categorical data has always posed a challenge in data analysis through clustering. With the increasing awareness about Big data analysis, the need for better clustering methods for categorical data and mixed data has arisen. The prevailing clustering algorithms are not suitable for clustering categorical data majorly because the distance functions used for continuous data are not applicable for...

Using Categorical Attributes for Clustering

The traditional clustering algorithms focused on clustering numeric data by exploiting the inherent geometric properties of the dataset for calculating distance functions between the points to be clustered. The distance based approach did not fit into clustering real life data containing categorical values. The focus of research then shifted to clustering such data and various categorical clust...

Clustering Numerical and Categorical Data

Clustering is an important technique for data mining which allows us to discover unknown relationships in our data sets. Clustering algorithms that use metrics based on the natural ordering of numbers cannot be applied to categorical (non-numerical) data. In this tutorial we will review the main methods for numerical data clustering (K-Means, Hierarchical Clustering and Fuzzy CMeans) and then s...

The "Best K" for Entropy-based Categorical Data Clustering

With the growing demand on cluster analysis for categorical data, a handful of categorical clustering algorithms have been developed. Surprisingly, to our knowledge, none has satisfactorily addressed the important problem for categorical clustering – how can we determine the best K number of clusters for a categorical dataset? Since categorical data does not have the inherent distance function ...

Publication date: 2000